A Comprehensive Study on Data Extraction in Sina Weibo
Authors
Abstract
With the rapid growth in the number of users of social networking services, thousands of terabytes of data are generated every day. Practical frameworks for extracting data from social networking sites have not yet been well investigated. In this paper, a methodology for data extraction from Sina Weibo is discussed. To design a proper extraction method, the properties of complex networks and the challenges of extracting data from them are discussed first. Then, the reason for choosing Sina Weibo as the data source is given. After that, the methods for data gathering are introduced, and the techniques for data sampling and data clean-up are discussed. Over 1 million users and hundreds of millions of social relations between them were extracted from Sina Weibo using the methods proposed in this paper.
Similar resources
Chinese Microblogs and Drug Quality
This paper examines the impact of the introduction of Sina Weibo, the most popular microblog in China, on the quality of drugs on the market. Using a unique data set on drug quality and Sina Weibo use, I explore the staggered diffusion of Sina Weibo across prefectures. I find that the number of bad drugs is decreasing in Sina Weibo use: if the Sina Weibo use is doubled, the number of bad drugs ...
Impact of Multimedia in Sina Weibo: Popularity and Life Span
Multimedia contents such as images and videos are widely used in social network sites nowadays. Sina Weibo, a Chinese microblogging service, is one of the first microblog platforms to incorporate multimedia content sharing features. This work provides statistical analysis on how multimedia contents are produced, consumed, and propagated in Sina Weibo. Based on 230 million tweets and 1.8 million...
Topical differences between Chinese language Twitter and Sina Weibo
Sina Weibo, China’s most popular microblogging platform, is currently used by over 500M users and is considered to be a proxy of Chinese social life. In this study, we contrast the discussions occurring on Sina Weibo and on Chinese language Twitter in order to observe two different strands of Chinese culture: people within China who use Sina Weibo with its government imposed restrictions and th...
A New Clustering Model Based on Word2vec Mining on Sina Weibo Users’ Tags
Clustering of Weibo users is one of the most important topics in data mining on social networks. Clustering can help uncover the relations among people or between people and resources. A lot of work on clustering has focused on analyzing personal relationships, whereas we focus our clustering model on preferences and interests. In this article, we propose a new clustering model focusin...
The Political Economy of Social Media in China
This paper examines the role of Chinese social media in three areas: organizing collective action, surveillance of government officials, and propaganda. Our study is based on a data set of 13.2 billion blog posts published on Sina Weibo, the most prominent Chinese microblogging platform, during the 2009-2013 period. We find millions of posts discussing explicit corruption allegations and col...